Optimization of parameterized quantum circuits is essential for the application of variational quantum algorithms (VQAs) to computational tasks. However, existing optimization algorithms for VQAs require an excessive number of quantum measurement shots to estimate expectation values of observables or to iterate updates of circuit parameters, and this cost is a significant obstacle to practical use. To address this problem, we develop an efficient framework, stochastic gradient line Bayesian optimization (SGLBO), for circuit optimization with fewer measurement shots. SGLBO reduces the measurement-shot cost by estimating an appropriate direction for updating the parameters based on stochastic gradient descent (SGD) and by further utilizing Bayesian optimization (BO) to estimate the optimal step size in each iteration of SGD. We formulate an adaptive measurement-shot strategy that makes the optimization feasible without relying on precise expectation-value estimation or on many iterations; moreover, we show that a suffix-averaging technique can significantly reduce the effect of statistical and hardware noise in the optimization of VQAs. Our numerical simulations demonstrate that SGLBO augmented with these techniques can drastically reduce the required number of measurement shots, improve the accuracy of the optimization, and enhance the robustness against noise compared with other state-of-the-art optimizers on representative tasks for VQAs. These results establish a framework of quantum circuit optimizers that integrates two different optimization approaches, SGD and BO, to significantly reduce the measurement-shot cost.
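The loop the abstract describes — an SGD-style update direction, a one-dimensional Bayesian optimization over the step size along that direction, and suffix averaging over the final iterates — can be sketched on a toy problem. Everything below (the noisy quadratic standing in for a VQA cost, the shot-noise model, the RBF kernel length scale, and the candidate step-size grid) is an illustrative assumption, not the paper's implementation:

```python
import math
import numpy as np

rng = np.random.default_rng(0)

def expval(x, shots=1000):
    # Toy stand-in for a VQA cost: a quadratic "energy" plus statistical
    # noise whose standard deviation shrinks as 1/sqrt(shots).
    return float(np.sum(x**2) + rng.normal(0.0, 1.0 / math.sqrt(shots)))

def grad_estimate(x, eps=0.1):
    # Central-difference gradient estimate from noisy evaluations.
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (expval(x + e) - expval(x - e)) / (2 * eps)
    return g

def gp_posterior(A, y, q, length=0.3, noise=1e-2):
    # GP posterior (RBF kernel) over step sizes; observations are centred.
    A, y, q = map(np.asarray, (A, y, q))
    ym = y.mean()
    k = lambda u, v: np.exp(-(u[:, None] - v[None, :]) ** 2 / (2 * length**2))
    K = k(A, A) + noise * np.eye(A.size)
    Ks = k(q, A)
    mu = ym + Ks @ np.linalg.solve(K, y - ym)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mu, np.sqrt(np.clip(var, 1e-12, None))

def expected_improvement(mu, sd, best):
    # Standard EI acquisition for minimization.
    z = (best - mu) / sd
    Phi = np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2))) for v in z])
    phi = np.exp(-z**2 / 2) / math.sqrt(2 * math.pi)
    return (best - mu) * Phi + sd * phi

def bo_step_size(x, d, a_max=1.0, n_init=3, n_iter=4):
    # 1D Bayesian optimization of the cost along x - a*d.
    cand = np.linspace(0.0, a_max, 51)
    A = list(np.linspace(0.05, a_max, n_init))
    y = [expval(x - a * d) for a in A]
    for _ in range(n_iter):
        mu, sd = gp_posterior(A, y, cand)
        a_next = float(cand[np.argmax(expected_improvement(mu, sd, min(y)))])
        A.append(a_next)
        y.append(expval(x - a_next * d))
    return A[int(np.argmin(y))]

# SGD loop: normalized gradient direction plus BO-chosen step size,
# finished with suffix averaging over the last half of the iterates.
x = np.array([2.0, -1.5])
iterates = []
for _ in range(15):
    g = grad_estimate(x)
    d = g / (np.linalg.norm(g) + 1e-12)
    x = x - bo_step_size(x, d) * d
    iterates.append(x.copy())
x_avg = np.mean(iterates[len(iterates) // 2:], axis=0)
```

The suffix average is what damps the statistical noise: individual late iterates jitter around the minimum once the gradient signal falls below the shot noise, while their mean stays close to it.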
Hierarchical Reinforcement Learning (HRL) algorithms have been demonstrated to perform well on high-dimensional decision-making and robotic control tasks. However, because they solely optimize for rewards, the agent tends to search the same space redundantly. This problem slows learning and reduces the achieved reward. In this work, we present an off-policy HRL algorithm that maximizes entropy for efficient exploration. The algorithm learns a temporally abstracted low-level policy and is able to explore broadly through the addition of entropy to the high-level policy. The novelty of this work is the theoretical motivation for adding entropy to the RL objective in the HRL setting. We empirically show that entropy can be added to both levels if the Kullback-Leibler (KL) divergence between consecutive updates of the low-level policy is sufficiently small. We performed an ablation study to analyze the effects of entropy on the hierarchy, in which adding entropy to the high level emerged as the most desirable configuration. Furthermore, a higher temperature in the low level leads to Q-value overestimation and increases the stochasticity of the environment that the high level operates on, making learning more challenging. Our method, SHIRO, surpasses state-of-the-art performance on a range of simulated robotic control benchmark tasks and requires minimal tuning.
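The objective change the abstract describes — augmenting the return with an entropy bonus, soft-actor-critic style, at a chosen level of the hierarchy — boils down to an entropy-regularized Bellman target. A minimal sketch (the function name and the temperature value are illustrative assumptions, not SHIRO's actual code):

```python
def soft_td_target(reward, q_next, logp_next, gamma=0.99, alpha=0.2):
    """Entropy-regularized (soft) TD target: the usual bootstrapped value
    plus an entropy bonus -alpha * log pi(a'|s') on the next action.
    Choosing alpha (the temperature) per level is what the ablation varies:
    alpha > 0 at the high level, alpha ~ 0 at the low level."""
    return reward + gamma * (q_next - alpha * logp_next)

# Example: reward 1.0, next-state Q-value 2.0, next-action log-prob -1.0
target = soft_td_target(1.0, 2.0, -1.0)
```

With alpha = 0 this reduces to the ordinary TD target, which matches the abstract's observation that a high low-level temperature is harmful: the entropy bonus there inflates Q-targets and makes the high level's effective environment more stochastic.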